Semi-supervised model adaptation for statistical machine translation
Similar resources
Bayesian Semi-Supervised Chinese Word Segmentation for Statistical Machine Translation
Words in Chinese text are not naturally separated by delimiters, which poses a challenge to standard machine translation (MT) systems. In MT, the widely used approach is to apply a Chinese word segmenter trained from manually annotated data, using a fixed lexicon. Such word segmentation is not necessarily optimal for translation. We propose a Bayesian semi-supervised Chinese word segmentation m...
Model Adaptation for Statistical Machine Translation
Statistical machine translation (SMT) systems use statistical learning methods to learn how to translate from large amounts of parallel training data. Unfortunately, SMT systems are tuned to the domain of the training data and need to be adapted before they can be used to translate data in a different domain. First, we consider a semi-supervised technique to perform model adaptation. We explore...
Semi-Supervised Learning for Neural Machine Translation
While end-to-end neural machine translation (NMT) has made remarkable progress recently, NMT systems only rely on parallel corpora for parameter estimation. Since parallel corpora are usually limited in quantity, quality, and coverage, especially for low-resource languages, it is appealing to exploit monolingual corpora to improve NMT. We propose a semisupervised approach for training NMT model...
Semi-supervised learning for Machine Translation
Statistical machine translation systems are usually trained on large amounts of bilingual text which is used to learn a translation model, and also large amounts of monolingual text in the target language used to train a language model. In this chapter we explore the use of semi-supervised methods for the effective use of monolingual data from the source language in order to improve translation...
Analysis of translation model adaptation in statistical machine translation
Numerous empirical results have shown that combining data from multiple domains often improves statistical machine translation (SMT) performance. For example, if we desire to build SMT for the medical domain, it may be beneficial to augment the training data with bitext from another domain, such as parliamentary proceedings. Despite the positive results, it is not clear exactly how and where add...
Journal
Journal title: Machine Translation
Year: 2007
ISSN: 0922-6567,1573-0573
DOI: 10.1007/s10590-008-9036-3